    Lifeguard: Local Health Awareness for More Accurate Failure Detection

    SWIM is a peer-to-peer group membership protocol with attractive scaling and robustness properties. However, slow message processing can cause SWIM to mark healthy members as failed (so-called false positive failure detections), despite the inclusion of a mechanism intended to avoid this. We identify the properties of SWIM that lead to the problem and propose Lifeguard, a set of extensions to SWIM that consider that the local failure detector module may itself be at fault, via the concept of local health. We evaluate this approach in a precisely controlled environment and validate it in a real-world scenario, showing that it drastically reduces the rate of false positives. The false positive rate and the detection time for true failures can be reduced simultaneously, compared to the baseline levels of SWIM.
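
    A minimal sketch of the local-health idea, assuming a counter-style health score that scales the probe timeout; the class and method names below are illustrative and are not Lifeguard's actual interface:

        class LocalHealthProber:
            """Sketch: a SWIM-style prober that suspects itself before suspecting peers."""

            def __init__(self, base_timeout=0.5, max_score=8):
                self.base_timeout = base_timeout  # baseline probe timeout in seconds
                self.max_score = max_score        # cap on how unhealthy we assume ourselves to be
                self.score = 0                    # 0 = locally healthy

            def probe_timeout(self):
                # The timeout grows with the local health score, so a slow local
                # node waits longer before declaring a peer failed.
                return self.base_timeout * (1 + self.score)

            def on_probe_result(self, ack_received):
                if ack_received:
                    self.score = max(0, self.score - 1)               # evidence we are keeping up
                else:
                    self.score = min(self.max_score, self.score + 1)  # the fault may be ours

        prober = LocalHealthProber()
        prober.on_probe_result(ack_received=False)  # a missed ack also raises self-suspicion
        print(prober.probe_timeout())               # 1.0 instead of the baseline 0.5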

    Assessing the Amazon Cloud Suitability for CLARREO's Computational Needs

    In this document we compare the performance of Amazon Web Services (AWS), also known as the Amazon Cloud, with the CLARREO (Climate Absolute Radiance and Refractivity Observatory) cluster and assess its suitability for the computational needs of the CLARREO mission. A benchmark executable that processes one month and one year of PARASOL (Polarization and Anisotropy of Reflectances for Atmospheric Sciences coupled with Observations from a Lidar) data was used. With the optimal AWS configuration, we found adequate data-processing times, comparable to those of the CLARREO cluster. The assessment of alternatives to the CLARREO cluster continues, and several options, such as a NASA-based cluster, are being considered.

    Supporting iteration in a heterogeneous dataflow engine

    Dataflow execution engines such as MapReduce, DryadLINQ, and PTask have enjoyed success because they simplify development for a class of important parallel applications. These systems sacrifice generality for simplicity: while many workloads are easily expressed, important idioms like iteration and recursion are difficult to express and support efficiently. We consider the problem of extending a dataflow engine to support data-dependent iteration in a heterogeneous environment, where architectural diversity introduces data migration and scheduling challenges that complicate the problem. We propose constructs that enable a dataflow engine to efficiently support data-dependent control flow in a heterogeneous environment, implement them in a prototype system called IDEA, and use them to implement a variant of optical flow, a well-studied computer vision algorithm. Optical flow relies heavily on nested loops, making it difficult to express without explicit support for iteration. We demonstrate that IDEA enables up to 18× speedup over a sequential implementation and a 32% speedup over a GPU implementation using synchronous host-based control.
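
    A hypothetical sketch, in plain Python rather than IDEA's actual constructs, of what a data-dependent iteration primitive provides: the engine re-executes a body until a predicate evaluated on the produced data says to stop, instead of the host program driving the loop synchronously:

        def iterate(body, should_continue, state, max_iters=1000):
            """Re-run `body` on `state` while `should_continue(state)` holds."""
            for _ in range(max_iters):
                if not should_continue(state):
                    break
                state = body(state)
            return state

        # Toy refinement loop standing in for an inner optical-flow iteration:
        # keep halving a residual until it falls below a tolerance.
        result = iterate(
            body=lambda s: {"residual": s["residual"] * 0.5, "steps": s["steps"] + 1},
            should_continue=lambda s: s["residual"] > 1e-3,
            state={"residual": 1.0, "steps": 0},
        )
        print(result)  # {'residual': 0.0009765625, 'steps': 10}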

    Dandelion: a compiler and runtime for heterogeneous systems

    Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability, and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with very different programming abstractions and runtimes, programming them remains extremely challenging. Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span a diverse array of execution contexts, including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general-purpose programming languages such as C# and F#, and therefore provides an expressive data model and native language integration for user-defined functions. This enables programmers to write applications using standard high-level languages and development tools, independent of any specific execution context. Dandelion automatically and transparently distributes the data-parallel portions of a program to the available computing resources, including compute clusters for distributed execution and the CPU and GPU cores of individual compute nodes for parallel execution. To enable the automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses a GPU dataflow runtime called EDGE to manage GPU execution. This paper describes the design and implementation of the Dandelion compiler and runtime, focusing on the distributed CPU and GPU implementation. We report on our evaluation of the system using a diverse set of workloads and execution contexts.
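
    An illustrative analogue in Python (not the actual .NET LINQ API) of the kind of data-parallel query Dandelion targets: the programmer composes relational-style operators with ordinary user-defined functions, and the runtime, not the programmer, decides where and how each operator executes. The helper below simply fans the map stage out across local worker processes:

        from multiprocessing import Pool

        def square(x):        # ordinary user-defined function referenced by the query
            return x * x

        def is_even(x):
            return x % 2 == 0

        def select_where(data, selector, predicate, workers=4):
            """Map then filter, with the map fanned out across worker processes."""
            with Pool(workers) as pool:
                mapped = pool.map(selector, data)
            return [y for y in mapped if predicate(y)]

        if __name__ == "__main__":
            print(select_where(range(10), square, is_even))  # [0, 4, 16, 36, 64]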